PAC-Bayesian Bound
PAC-Bayesian Bound for the Conditional Value at Risk
Mhammedi, Zakaria, Guedj, Benjamin, Williamson, Robert C.
Conditional Value at Risk (CVaR) is a "coherent risk measure" which generalizes expectation (to which it reduces at a boundary parameter setting). Widely used in mathematical finance, it is garnering increasing interest in machine learning as an alternate approach to regularization, and as a means for ensuring fairness. This paper presents a generalization bound for learning algorithms that minimize the CVaR of the empirical loss. The bound is of PAC-Bayesian type and is guaranteed to be small when the empirical CVaR is small. We achieve this by reducing the problem of estimating CVaR to that of merely estimating an expectation. This then enables us, as a by-product, to obtain concentration inequalities for CVaR even when the random variable in question is unbounded.
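To make the object of the bound concrete, here is a minimal numerical sketch of the empirical CVaR via the standard Rockafellar-Uryasev variational formula; the function name and the tail-level convention (alpha as the tail mass) are our own choices, not fixed by the abstract:

```python
import numpy as np

def empirical_cvar(losses, alpha=0.05):
    """Empirical CVaR at tail level alpha via the variational
    (Rockafellar-Uryasev) formula:
        CVaR_alpha(Z) = min_mu { mu + E[(Z - mu)_+] / alpha },
    whose minimizer is the (1 - alpha)-quantile of Z."""
    mu = np.quantile(losses, 1.0 - alpha)        # optimal threshold
    return mu + np.mean(np.maximum(losses - mu, 0.0)) / alpha

rng = np.random.default_rng(0)
z = rng.exponential(size=10_000)                 # unbounded losses
print(empirical_cvar(z, alpha=0.05))             # mean of the worst 5% tail
```

The variational form is what makes the paper's reduction natural: at the optimal threshold mu, the CVaR is exactly the expectation of the auxiliary variable mu + (Z - mu)_+ / alpha, so estimating CVaR becomes estimating an expectation.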
On the Importance of Gradient Norm in PAC-Bayesian Bounds
Generalization bounds which assess the difference between the true risk and the empirical risk have been studied extensively. However, to obtain such bounds, current techniques rely on strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper we follow an alternative approach: we relax the uniform-boundedness assumptions to on-average bounded loss and on-average bounded gradient-norm assumptions. Following this relaxation, we propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities. These inequalities add a loss-gradient-norm term to the generalization bound, which intuitively serves as a surrogate for model complexity. We apply the proposed bound to Bayesian deep nets and empirically analyze the effect of this new loss-gradient-norm term on different neural architectures.
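The quantity this bound adds is an average loss-gradient norm over the data. As an illustration only, the sketch below computes that term for plain logistic regression, a stand-in for the Bayesian deep nets the paper studies; the function and variable names are hypothetical:

```python
import numpy as np

def avg_loss_gradient_norm(w, X, y):
    """On-average gradient-norm term E_x ||grad_w loss(w; x)||
    for logistic regression with the log loss, whose per-example
    gradient is (sigmoid(x.w) - y) * x."""
    p = 1.0 / (1.0 + np.exp(-X @ w))             # sigmoid predictions
    grads = (p - y)[:, None] * X                 # per-example gradients
    return np.linalg.norm(grads, axis=1).mean()  # average Euclidean norm

rng = np.random.default_rng(1)
X = rng.normal(size=(512, 10))
y = (rng.random(512) < 0.5).astype(float)
w = rng.normal(size=10)
print(avg_loss_gradient_norm(w, X, y))
```

The point of the relaxation is visible here: nothing requires the loss or its gradient to be uniformly bounded over all inputs, only that this average be controlled.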
Review for NeurIPS paper: PAC-Bayesian Bound for the Conditional Value at Risk
Conditional Value at Risk, or Expected Shortfall (CVaR), of a random variable is the expected value of that variable conditioned on its exceeding a given threshold. For example, it quantifies the amount of tail risk an investment portfolio carries. This quantity is important in many situations and is getting more attention in the ML community. Indeed, a learned predictor with poor average accuracy might nevertheless be of high utility if it achieves a high CVaR, provided there is a particular interest in the examples that fall in the best quantile (e.g., the best drivers for a car insurance company ... the only ones that should qualify for a reduction of their insurance premiums). There is still a lot to understand about CVaR from the learning-theory point of view; this paper proposes the first known PAC-Bayesian bound for CVaR.
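The reviewer's conditional-expectation definition is easy to check numerically: take the (1 - alpha)-quantile and average the tail beyond it. For continuous distributions this agrees with the variational formula sketched earlier; alpha = 0.05 and the lognormal example are assumed choices:

```python
import numpy as np

# CVaR as the mean of a random variable conditioned on exceeding
# its (1 - alpha)-quantile, checked on a heavy-tailed sample.
rng = np.random.default_rng(2)
z = rng.lognormal(size=100_000)
alpha = 0.05
q = np.quantile(z, 1.0 - alpha)                  # Value at Risk
cvar = z[z >= q].mean()                          # Expected Shortfall
print(f"VaR  = {q:.3f}")
print(f"CVaR = {cvar:.3f}")                      # CVaR >= VaR always
```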
Multi-View Majority Vote Learning Algorithms: Direct Minimization of PAC-Bayesian Bounds
Hennequin, Mehdi, Zitouni, Abdelkrim, Benabdeslem, Khalid, Elghazel, Haytham, Gaci, Yacine
The PAC-Bayesian framework has significantly advanced the understanding of statistical learning, particularly for majority voting methods. Despite its successes, its application to multi-view learning -- a setting with multiple complementary data representations -- remains underexplored. In this work, we extend PAC-Bayesian theory to multi-view learning, introducing novel generalization bounds based on Rényi divergence. These bounds provide an alternative to traditional Kullback-Leibler divergence-based counterparts, leveraging the flexibility of Rényi divergence. Furthermore, we propose first- and second-order oracle PAC-Bayesian bounds and extend the C-bound to multi-view settings. To bridge theory and practice, we design efficient self-bounding optimization algorithms that align with our theoretical results.
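For reference, the Rényi divergence that replaces the KL term in such bounds is, over a finite hypothesis set, D_alpha(Q || P) = (1 / (alpha - 1)) log sum_h q(h)^alpha p(h)^(1 - alpha), recovering KL as alpha tends to 1. A small sketch; the example prior and posterior are invented:

```python
import numpy as np

def renyi_divergence(q, p, alpha):
    """Renyi divergence D_alpha(Q || P) between two categorical
    distributions over a finite hypothesis set; alpha -> 1
    recovers the KL divergence used in classical PAC-Bayes bounds."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    if np.isclose(alpha, 1.0):                   # KL limit
        return np.sum(q * np.log(q / p))
    return np.log(np.sum(q**alpha * p**(1.0 - alpha))) / (alpha - 1.0)

prior = np.full(4, 0.25)                         # uniform prior over voters
posterior = np.array([0.7, 0.1, 0.1, 0.1])       # learned posterior
for a in (0.5, 1.0, 2.0):
    print(a, renyi_divergence(posterior, prior, a))
```

Since D_alpha is non-decreasing in alpha, the order parameter trades tightness of the complexity term against the strength of the moment conditions the bound requires, which is the flexibility the abstract refers to.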
On PAC-Bayesian Bounds for Random Forests
Lorenzen, Stephan Sloth, Igel, Christian, Seldin, Yevgeny
Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not account for the averaging out of the errors of individual classifiers when taking the majority vote. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small. The second approach, based on PAC-Bayesian C-bounds, takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set, which come at the cost of a smaller training set, gave better performance guarantees but worse predictive performance in most experiments.
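The factor-of-two argument and the averaging effect the abstract describes are easy to see in simulation: the majority vote errs only when at least half of the voters err, so by Markov's inequality its error is at most twice the Gibbs error. A sketch with synthetic tree errors; the error rate, ensemble size, and independence assumption are arbitrary choices for illustration:

```python
import numpy as np

# Simulate 0/1 error indicators for 1000 examples and 9 trees,
# each tree erring independently with probability 0.3; a uniform
# posterior over trees is assumed.
rng = np.random.default_rng(3)
errors = (rng.random((1000, 9)) < 0.3).astype(float)

gibbs = errors.mean()                            # avg single-tree error
majority_vote = (errors.mean(axis=1) >= 0.5).mean()
print(f"Gibbs error:         {gibbs:.3f}")
print(f"Majority-vote error: {majority_vote:.3f} <= {2 * gibbs:.3f}")
```

With independent errors, the majority vote lands far below the 2x Gibbs bound (here roughly 0.10 versus 0.60), which is exactly the gap the first approach cannot capture; with strongly correlated trees, as in real random forests, the gap shrinks.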
A PAC-Bayesian bound for Lifelong Learning
Pentina, Anastasia, Lampert, Christoph H.
Transfer learning has received a lot of attention in the machine learning community in recent years, and several effective algorithms have been developed. However, relatively little is known about their theoretical properties, especially in the setting of lifelong learning, where the goal is to transfer information to tasks for which no data have been observed so far. In this work we study lifelong learning from a theoretical perspective. Our main result is a PAC-Bayesian generalization bound that offers a unified view of existing paradigms for transfer learning, such as the transfer of parameters or the transfer of low-dimensional representations. We also use the bound to derive two principled lifelong learning algorithms, and we show that these yield results comparable with existing methods.
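The complexity term such a bound charges a new task is a divergence between the task's posterior and a prior transferred from earlier tasks. As a generic illustration of the "transfer of parameters" paradigm (not the paper's exact algorithm), here is the closed-form KL between diagonal Gaussians; all names and values are invented:

```python
import numpy as np

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between diagonal Gaussians: the complexity term a
    PAC-Bayesian bound charges a new task's posterior Q for deviating
    from a prior P transferred from previously observed tasks."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

prior_mu, prior_var = np.zeros(5), np.ones(5)          # transferred prior
post_mu, post_var = np.full(5, 0.3), np.full(5, 0.5)   # new-task posterior
print(kl_diag_gaussians(post_mu, post_var, prior_mu, prior_var))
```

A prior learned from related tasks sits close to good posteriors for future tasks, shrinking this term and hence the bound, which is the intuition behind transferring parameters.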